feat: Improve search indexing reliability #1065

MonkeyDo · 2024-02-16T12:31:57Z

This PR came about after #740 started causing the website to crash when editing/creating Work entities.

I took this opportunity to make the search indexing more reliable by improving error catching.

While rewriting some of that code, I realized we are indexing a lot of fields that we have no reason to index.
First, some internal ORM fields that are output by default when serializing to JSON.
Second, the entire entity info is stored in ElasticSearch, even though we load entities afresh from the DB based on the search result hits. At the moment we do absolutely nothing with all this entity info other than take up disk space, slow down indexing and transfer too much data.

So we are now "cleaning" the documents before indexing them, i.e. keeping only the fields we need.
In the future, we will want to store more information and return it directly instead of fetching from the DB for presentation, but that will require more rewriting.
If we are planning on moving to SOLR soon enough, that seems like a possible waste of time. A good candidate for a part 2 in any case.
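
To illustrate the "cleaning" step described above, here is a minimal sketch of an allowlist-based filter. It is not the code from this PR: the `cleanDocument` name, the `INDEXED_FIELDS` list and the sample field names are assumptions chosen for the example.

```typescript
// Hypothetical shape of a serialized ORM model; field names are illustrative only.
type RawDocument = Record<string, unknown>;

// Assumed allowlist of fields worth indexing (not the actual list used in this PR).
const INDEXED_FIELDS = ["bbid", "id", "type", "defaultAlias"] as const;

// Copy only the allowlisted fields, dropping ORM internals and unused entity info.
function cleanDocument(
  raw: RawDocument,
  keep: readonly string[] = INDEXED_FIELDS
): RawDocument {
  const cleaned: RawDocument = {};
  for (const field of keep) {
    if (raw[field] !== undefined) {
      cleaned[field] = raw[field];
    }
  }
  return cleaned;
}

// Usage: internal and unused fields are stripped before the document is indexed.
const cleanedExample = cleanDocument({
  bbid: "hypothetical-bbid",
  type: "Work",
  defaultAlias: { name: "Example work" },
  _pivot_set_id: 42, // stand-in for an internal ORM field
  annotationId: 7, // stand-in for entity info the search never reads
});
```

An allowlist (rather than a blocklist) means any new field added to the models stays out of the index until someone decides it is needed there.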

Also removed some manipulation we did to shoehorn collection, editor and area ids into a "bbid" field.
The search results now support an "id" field as well.
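
As a rough sketch of what supporting both identifier fields can look like, the helper below prefers "bbid" and falls back to "id". The `IndexableDocument` interface and the `getDocumentId` function are hypothetical names for this example, not identifiers taken from the PR.

```typescript
// Hypothetical document shape: entities carry a "bbid", while other types
// (e.g. collections, editors, areas) are assumed here to carry an "id".
interface IndexableDocument {
  bbid?: string;
  id?: string | number;
  type: string;
}

// Resolve the identifier without copying "id" into a fake "bbid" field.
function getDocumentId(doc: IndexableDocument): string {
  const identifier = doc.bbid ?? doc.id;
  if (identifier === undefined) {
    throw new Error(`Document of type ${doc.type} has neither "bbid" nor "id"`);
  }
  return String(identifier);
}
```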

Finally, since we are now passing ORM models over for indexing, some rewriting was necessary wherever we call the search indexing, and I took that opportunity to rewrite some messy chained promises to async/await syntax.
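
For readers unfamiliar with this kind of rewrite, the sketch below shows the general pattern of replacing a promise chain with async/await and catching indexing errors so that a search failure does not take down the calling route. The `SearchClient` interface, `indexSafely` function and `IndexResult` type are assumptions for illustration and do not reflect the project's actual API.

```typescript
// Hypothetical minimal client interface; the real search client differs.
interface SearchClient {
  index(doc: Record<string, unknown>): Promise<void>;
}

interface IndexResult {
  ok: boolean;
  error?: unknown;
}

// Promise-chain style, for comparison:
//   client.index(doc).then(() => ({ok: true})).catch(error => ({ok: false, error}));

// async/await style: the failure is caught and reported to the caller
// instead of propagating as an unhandled rejection.
async function indexSafely(
  client: SearchClient,
  doc: Record<string, unknown>
): Promise<IndexResult> {
  try {
    await client.index(doc);
    return { ok: true };
  } catch (error) {
    return { ok: false, error };
  }
}
```

Returning a result object keeps the decision about logging or retrying with the caller, which is one possible design among several.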

We have been storing a ton of information in the search index that we just won't ever use, such as set ids and internal props from the ORM Model (_pivot…). Instead, let's pass the ORM models along and create a new utility to strip the document to index down to what we actually need.

With the change from the previous commit (accepting an ORM model rather than JSON for search indexing), we need to rewrite the parts of the code that use the search indexing accordingly. Taking this opportunity to rewrite some code from promises to async/await syntax.

MonkeyDo and others added 9 commits February 7, 2024 17:51

search: Omit some fields from object to index

2ff7709

search: catch more error scenarios while indexing

d141020

chore(search): Make search tools a Typescript file

ac497d3

search: Add defaultAlias back for indexing work author

9a12fdf

feat(search): Finish adapting indexing mechanism

51af091

Missed some places where we need to set attributes on the ORM models for editor, collections and other non-entity types. Cheeky async/await rewrite to clarify some of it.

chore(search): Allow for id field in results

a3360ff

Not all entities have a BBID field now, some have "id" instead

Merge branch 'master' into search-indexing-issues

311ea9b

MonkeyDo merged commit 2622d7c into master Feb 16, 2024
4 checks passed

MonkeyDo deleted the search-indexing-issues branch February 16, 2024 13:04

MonkeyDo added a commit that referenced this pull request Jun 4, 2024

Send entity model for indexing

dac72ad

Instead of JSON representation. See #1065
